Abstract: Petabyte-scale distributed storage systems are currently
transitioning to erasure codes to achieve higher storage
efficiency. Classical codes like Reed-Solomon are highly sub- 
optimal for distributed environments due to their high overhead 
in single-failure events. Locally Repairable Codes (LRCs) form 
a new family of codes that are repair efficient. In particular, 
LRCs minimize the number of nodes participating in single node 
repairs while generating small network traffic for repairs. Two 
large-scale distributed storage systems have already implemented 
different types of LRCs: Windows Azure Storage and the 
Hadoop Distributed File System RAID used by Facebook. The 
fundamental bounds for LRCs, namely the best possible distance 
for a given code locality, were recently discovered, but few explicit 
constructions exist. In this work, we present an explicit and 
simple to implement construction of optimal LRCs, for code 
parameters previously established only by existence results. For 
the analysis of the code's optimality, we derive a new result on 
the matroid represented by the code's generator matrix. 

I. Introduction 

Traditional architectures of large-scale storage systems rely 
on distributed file systems that provide reliability through 
block replication. Typically, data is split in blocks and three 
copies of each block are stored in different storage nodes. The 
major disadvantage of triple replication is the large storage 
overhead. As the amount of stored data is growing faster 
than hardware infrastructure, this becomes a factor of three 
in the storage growth rate, resulting in a major data center 
cost bottleneck. 

As is well known, erasure coding techniques achieve higher
data reliability with considerably smaller storage overhead [1].
For that reason, different codes are being deployed in production
storage clusters. Application scenarios where coding techniques
are currently being deployed include cloud storage systems like
Windows Azure [2], big data analytics clusters (e.g., the Facebook
Analytics Hadoop cluster [3]), archival storage systems, and
peer-to-peer storage systems like Cleversafe and Wuala.

It is now understood that classical codes (such as Reed-Solomon)
are highly suboptimal for distributed storage repairs [4]. For
example, the Facebook analytics Hadoop cluster discussed in [3]
deployed Reed-Solomon encoding for 8% of the stored data. That
portion of the data generated repair traffic approximately equal
to 20% of the total network traffic. Therefore, as discussed in
[3], the main bottleneck in increasing code deployment in storage
systems is designing new codes that perform well for distributed
repairs.

Three major repair cost metrics have been identified in the
recent literature: i) the number of bits communicated in the
network, i.e., the repair bandwidth [4]-[9], ii) the number of
bits read, i.e., the disk I/O [7], [10], and iii) more recently, the
number of nodes that participate in the repair process, also
known as repair locality. Each of these metrics is more
relevant for different systems, and their fundamental limits
are not completely understood. In this work, we focus on
the metric of repair locality, one that seems most relevant for
single-location high-connectivity storage clusters.

Locality was identified as a good metric independently by
Gopalan et al. [11], Oggier et al. [12], and Papailiopoulos et
al. [13]. Consider a code of total length n with k information
symbols. Symbol i has locality r_i if it can be reconstructed by
accessing at most r_i other code symbols. For example, in an
(n,k) MDS code, every symbol has trivial locality k. We will
say that a systematic code has information symbol locality r
if all the k information symbols have locality r. Similarly, we
will say that a code has all-symbol locality r if all n symbols
have locality r. Codes that have good locality properties were
initially studied in [14], [15].

In [11], a trade-off between code distance, i.e., reliability,
and information symbol locality was derived for scalar linear
codes. In [16], an information theoretic trade-off for any code
(linear or nonlinear) was derived when considering all-symbol
locality. An (n, k) code with (information symbol or all-symbol)
locality r has minimum distance d that is bounded as

    $d \le n - k - \lceil k/r \rceil + 2.$    (1)
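
As a quick numerical reading of bound (1), $d \le n - k - \lceil k/r \rceil + 2$, the following minimal Python sketch evaluates its right-hand side. The function name `lrc_distance_bound` and the parameter values in the usage lines are hypothetical and chosen only for illustration.

```python
def lrc_distance_bound(n: int, k: int, r: int) -> int:
    """Right-hand side of bound (1): n - k - ceil(k / r) + 2."""
    if not (1 <= r <= k <= n):
        raise ValueError("expected 1 <= r <= k <= n")
    ceil_k_over_r = -(-k // r)          # integer ceiling of k / r
    return n - k - ceil_k_over_r + 2


# With r = k the bound reduces to the Singleton bound n - k + 1 of MDS codes.
print(lrc_distance_bound(9, 4, 4))
# A smaller locality r lowers the best achievable distance.
print(lrc_distance_bound(12, 6, 3))
```

Using integer arithmetic for the ceiling avoids floating-point rounding for large k.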



Bounds on the code distance for a given locality were also
derived and generalized in parallel and subsequent works [17]-[19].

An (n,k,r) locally repairable code (LRC) is an (n, k) code
such that any of its symbols can be reconstructed by accessing
and processing at most r other symbols (all-symbol locality).
Codes with all-symbol locality that meet the above bound are
termed optimal LRCs and are known to exist when (r + 1)
divides n [11], [16]-[19]. Explicit optimal LRC constructions
for some code parameters were introduced in [16]-[20]. Some
works extend the designs and theoretical bounds to the case
where repair bandwidth and locality are jointly optimized
under multiple local failures [18], [19], and/or security issues
are addressed [19]. The construction of practical LRCs is
further motivated by the fact that two major distributed storage
systems have already implemented different types of LRCs:
Windows Azure Storage [2] and the Hadoop Distributed File
System RAID used by Facebook [3]. Designing LRCs that have
optimal distance for most code parameters n, k, r and are easy
to implement has so far remained an open problem.

Our Contribution: We introduce a new explicit family 
of optimal (n, k, r)-LRCs. Our construction is optimal for 
any (n,k,r) such that r + 1 divides n. Our codes require 
O(k log n) bits in the description of each symbol and their
main advantage is in their design. The codes are very simple 
to implement and are based on Reed-Solomon codes with 
added symbols that account for locality. The main theoretical 
challenge is in proving that they are optimal. This is done by 
first establishing a connection between the minimum distance 
and locality of a linear code and the matroid that it represents. 
This connection will provide a sufficient condition for optimal 
LRCs. Then it is shown using some properties of the determi- 
nant function and polynomials over finite fields, that the code 
generator matrix satisfies this condition. 

The works most closely related to ours are the two parallel and
independent studies [18] and [19]. There, optimal LRC
constructions for a similar range of code parameters are presented.
Although these constructions rely on different tools and
designs than the ones presented here, it would be of interest
to explore further connections.

The remainder of the paper is organized as follows. In
Section II we present our code construction. In Section III we
establish a precise formula for the minimum distance of a linear
code in terms of the matroid represented by the generator
matrix. In Section IV we use the established results and the
algebra of polynomials over finite fields to prove the optimality
of our code.

II. Code Construction 

In this section we construct an optimal (n,k,r)-LRC for the
following sets of coding parameters:

    $(r + 1)$ divides $n$,    (2)
    or $n \bmod (r + 1) - 1 \ge k \bmod r > 0$.    (3)

Due to space limitations, we prove its optimality only for the
case where r + 1 divides n. Let $m = \frac{nr}{r+1}$ and assume that
1 < r < k (see footnote 1).

Our construction is very simple: we take the output of an
(m,k)-Reed-Solomon code and, for each group of r coded symbols, we
re-encode them into r + 1 symbols using a specific MDS code.
This simple construction will be shown to have i) the desired
locality r and ii) optimal minimum distance. In Fig. 1 we give
a sketch of our construction.

Footnote 1: If r = k, any (n,k) MDS code is an optimal (n,k,r = k) LRC.
Moreover, if r = 1, we can show that since r divides k, then r + 1 = 2
has to divide n, i.e., n is even. Storing each symbol of an (n/2,k) MDS
code twice results in an optimal (n,k,r = 1) LRC.

[Figure 1 diagram: each group of r Reed-Solomon coded blocks is passed through an (r + 1, r) MDS code; the outputs together form the (n, k, r)-LRC.]

Figure 1. A sketch of our (n,k,r)-LRC construction. We start with an
(m = nr/(r+1), k)-Reed-Solomon code. We then re-encode the m Reed-Solomon
coded blocks in the following manner: each consecutive group of r coded
blocks is re-encoded using a specific (r + 1, r) MDS code. The n outputs of
the n/(r+1) local codes are the encoded blocks of our LRC. The locality r
is obtained directly from the local codes: if a block is missing, the
remaining r coded blocks in its group can be used to reconstruct it.
Although our code as presented in this figure is not in systematic form, it
can easily be put in systematic form by a simple transformation of the
generator matrix.

More formally, let $\mathbb{F}_p$ be a field of size $p \ge m$, and
consider a file that is cut into k blocks $x = [x_1, \ldots, x_k]$, where
each block is an element of the field $\mathbb{F}_{p^{k+1}}$. These k blocks
are encoded into n coded blocks

    $y = [y_1, \ldots, y_n] = x \cdot G,$

where G is the generator matrix, defined over the extension field
$\mathbb{F}_{p^{k+1}}$. The construction of G follows.

Construction 1 Let $\alpha_1, \ldots, \alpha_m$ be m distinct elements of the
field $\mathbb{F}_p$, with $p \ge m$, and let $\omega$ be a primitive element of the
field $\mathbb{F}_{p^{k+1}}$. Also let V be a k x m Vandermonde matrix with
its i-th column being equal to
$\boldsymbol{\alpha}_i = (1, \alpha_i, \ldots, \alpha_i^{k-1})^t$. Then, the
generator matrix of the code is

    $G = V \cdot (I_{m/r} \otimes A),$

where $I_s$ is the identity matrix of size s and $A = (a_{i,j})$ is
an r x (r + 1) matrix defined as follows: ones on the main
diagonal, and $\omega$'s on the diagonal whose elements $a_{i,j}$ satisfy
$j - i = 1$.

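An example of the matrix A for r = 3 (so r + 1 = 4), with $\omega$ the
primitive element from Construction 1, is given below:

    $A = \begin{pmatrix} 1 & \omega & 0 & 0 \\ 0 & 1 & \omega & 0 \\ 0 & 0 & 1 & \omega \end{pmatrix}.$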

Notice that the matrix A serves as the generator matrix of the 
(r + 1, r) MDS code used in the second encoding step. This 
step provides the locality property of the code. 

Notice that G is not in systematic form: no subset of k of its
columns forms the identity matrix. However, the code can easily
be made systematic while retaining the locality and distance
properties: pick k linearly independent columns of G, say the
k x k submatrix $G_K$, and use as a new code generator matrix the
matrix $G_{sys} = G_K^{-1} G$.
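
The construction can be tried out on a toy instance. The sketch below is illustrative only and rests on several assumptions that are not part of the paper: the parameters (n, k, r) = (6, 3, 2) with p = 5, the field $\mathbb{F}_{5^4}$ realized as $\mathbb{F}_5[x]/(x^4 - 2)$, the elements $\alpha_i = 1, 2, 3, 4$, and all function names. The element playing the role of the primitive element is the class of x; it is not a primitive element, but its minimal polynomial over $\mathbb{F}_5$ has degree k + 1 = 4, which is the property the optimality argument relies on.

```python
from itertools import combinations

# Assumed toy field: F_{5^4} = F_5[x] / (x^4 - 2); elements are coefficient tuples.
P, D, C = 5, 4, 2
ZERO = (0,) * D
ONE = (1,) + (0,) * (D - 1)
OMEGA = (0, 1) + (0,) * (D - 2)   # class of x, minimal polynomial of degree 4

def embed(a):
    """Embed an element of F_5 into the extension field."""
    return (a % P,) + (0,) * (D - 1)

def add(a, b):
    return tuple((x + y) % P for x, y in zip(a, b))

def sub(a, b):
    return tuple((x - y) % P for x, y in zip(a, b))

def mul(a, b):
    prod = [0] * (2 * D - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    for deg in range(2 * D - 2, D - 1, -1):   # reduce using x^D = C
        prod[deg - D] += C * prod[deg]
    return tuple(v % P for v in prod[:D])

def inv(a):
    """Inverse of a nonzero element, computed as a^(q - 2) with q = P^D."""
    result, base, e = ONE, a, P ** D - 2
    while e:
        if e & 1:
            result = mul(result, base)
        base = mul(base, base)
        e >>= 1
    return result

def rank(vectors):
    """Rank of a non-empty list of vectors over the field (Gaussian elimination)."""
    rows = [list(v) for v in vectors]
    rk = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(rk, len(rows)) if rows[i][c] != ZERO), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        s = inv(rows[rk][c])
        rows[rk] = [mul(s, v) for v in rows[rk]]
        for i in range(len(rows)):
            if i != rk and rows[i][c] != ZERO:
                f = rows[i][c]
                rows[i] = [sub(v, mul(f, w)) for v, w in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def build_lrc_columns(n, k, r, alphas):
    """Columns of G = V (I_{m/r} kron A), with m = n r / (r + 1)."""
    m = n * r // (r + 1)
    V = [[embed(pow(a, e, P)) for e in range(k)] for a in alphas[:m]]
    G = []
    for g in range(m // r):
        block = V[g * r:(g + 1) * r]
        for j in range(r + 1):
            col = [ZERO] * k
            if j < r:    # the 1 on the main diagonal of A
                col = [add(x, y) for x, y in zip(col, block[j])]
            if j > 0:    # the omega on the superdiagonal of A
                col = [add(x, mul(OMEGA, y)) for x, y in zip(col, block[j - 1])]
            G.append(col)
    return G

def min_distance(G, k):
    """d = n - max |T| over column sets T of rank < k (brute force, tiny n only)."""
    n = len(G)
    largest = max(size for size in range(n + 1)
                  for T in combinations(range(n), size)
                  if size == 0 or rank([G[i] for i in T]) < k)
    return n - largest

G = build_lrc_columns(6, 3, 2, alphas=[1, 2, 3, 4])
print(min_distance(G, 3))   # to be compared with n - k - ceil(k/r) + 2
```

The brute-force distance computation enumerates all column subsets, so it is meant for toy sizes only; it uses the fact that a nonzero codeword can vanish on a set of coordinates exactly when the corresponding columns have rank below k.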

Theorem 1 The code generated by G has locality r and
optimal minimum distance $d = n - k - \lceil k/r \rceil + 2$, when
$(r + 1) \mid n$.



The proof of the above theorem is done in two steps. First, in
Section III we derive a new result that expresses the minimum
distance of a linear code using the matroid represented by its
generator matrix. This result will imply that it is sufficient
to verify that some subsets of k columns of G are full-rank.
Then, in Section IV we show that these submatrices are indeed
invertible, by using properties of the determinant function and
polynomials over finite fields.

III. Matroids and Locally Repairable Codes 

A. Overview of Matroid Theory 

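We start with a quick overview of matroid theory. A
matroid $\mathcal{M} = \mathcal{M}([n], \mathrm{rank}(\cdot))$ is defined by the set of
integers $[n] = \{1, \ldots, n\}$ and the rank(·) function, an
integer-valued function defined on all subsets of [n] that satisfies
the following properties:

• $\mathrm{rank}(A) \ge 0$, for any $A \subseteq [n]$.

• $\mathrm{rank}(A) \le |A|$, for any $A \subseteq [n]$.

• $\mathrm{rank}(A) \le \mathrm{rank}(B)$, for any sets $A \subseteq B \subseteq [n]$.

• $\mathrm{rank}(A \cup B) + \mathrm{rank}(A \cap B) \le \mathrm{rank}(A) + \mathrm{rank}(B)$, for
any sets $A, B \subseteq [n]$.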

A set A is called independent if $\mathrm{rank}(A) = |A|$; otherwise
A is called dependent. A set is referred to as a circuit if it is
dependent and all of its proper subsets are independent. This
means that if c is a circuit, then $\mathrm{rank}(c) = |c| - 1$.
Example: Let G be a k x n matrix (e.g., a code generator matrix)
over a field. Define the matroid $\mathcal{M}([n], \mathrm{rank}(\cdot))$, where the
rank of a set $A \subseteq [n]$ is $\mathrm{rank}(A) = \mathrm{rank}(G_A)$, $G_A$ is the
sub-matrix of G with columns indexed by A, and rank operates
on a set of columns in the well-known linear-algebraic way.
In this case, the matroid $\mathcal{M}$ is represented by G.
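
The matroid represented by a small matrix can be explored by brute force. The following sketch assumes a prime field $\mathbb{F}_p$, columns given as tuples of integers, and 0-based column indices; the function names and the 2 x 4 binary example matrix are hypothetical. It relies on Python 3.8+ for the modular inverse `pow(x, -1, p)`.

```python
from itertools import combinations

def rank_mod_p(cols, p):
    """Rank over F_p (p prime) of the matrix whose columns are `cols`."""
    rows = [list(c) for c in cols]      # the transpose has the same rank
    if not rows:
        return 0
    rk = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(rk, len(rows)) if rows[i][c] % p), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        s = pow(rows[rk][c], -1, p)     # modular inverse of the pivot
        rows[rk] = [(s * v) % p for v in rows[rk]]
        for i in range(len(rows)):
            if i != rk and rows[i][c] % p:
                f = rows[i][c]
                rows[i] = [(v - f * w) % p for v, w in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def circuits(cols, p):
    """All circuits: dependent sets whose proper subsets are all independent."""
    n = len(cols)

    def rk(A):
        return rank_mod_p([cols[i] for i in A], p)

    found = []
    for size in range(1, n + 1):
        for A in combinations(range(n), size):
            # It suffices to test the subsets obtained by dropping one element.
            if rk(A) < size and all(rk(B) == size - 1
                                    for B in combinations(A, size - 1)):
                found.append(frozenset(A))
    return found

# Hypothetical generator matrix over F_2 with columns (1,0), (0,1), (1,1), (1,1).
example = [(1, 0), (0, 1), (1, 1), (1, 1)]
for c in circuits(example, 2):
    print(sorted(c), "rank =", rank_mod_p([example[i] for i in c], 2))
```

Each printed circuit c satisfies rank(c) = |c| - 1, in line with the definition above.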

B. Connections to Code Distance 

A collection of sets $c_1, c_2, \ldots$ is said to have a non trivial
union if no set is contained in the union of the others,
that is, $c_i \not\subseteq \bigcup_{j \ne i} c_j$ for any i. Using the above definitions
we state a simple lemma that will be fundamental in our
derivations.

Lemma 1 Let $c_1, \ldots, c_m$ be m circuits in $\mathcal{M}$. If the circuits
have a non trivial union, then
$\mathrm{rank}(\bigcup_{i=1}^{m} c_i) \le |\bigcup_{i=1}^{m} c_i| - m$.

Proof: We apply induction on m. For m = 1, since $c_1$
is a circuit, $\mathrm{rank}(c_1) = |c_1| - 1$. Let m > 1 and denote
$c = \bigcup_{i=1}^{m-1} c_i$. By the last property of the rank function,
$\mathrm{rank}(c \cup c_m) \le \mathrm{rank}(c) + \mathrm{rank}(c_m) - \mathrm{rank}(c \cap c_m)$. Since
the union of the circuits is non trivial, $c \cap c_m$ is a proper
subset of $c_m$ and therefore is independent. Then by the
induction assumption
$\mathrm{rank}(c) + \mathrm{rank}(c_m) - \mathrm{rank}(c \cap c_m) \le |c| - (m - 1) + |c_m| - 1 - |c \cap c_m| = |c \cup c_m| - m$. ■
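
Lemma 1 can be checked exhaustively on a small binary matrix. The sketch below is an illustration under assumptions of our own: columns are encoded as integer bitmasks over $\mathbb{F}_2$, indices are 0-based, and the function names and the 3 x 6 example matrix are hypothetical.

```python
from itertools import combinations

def rank_f2(vectors):
    """Rank over F_2 of vectors given as integer bitmasks (XOR basis)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)   # clears the leading bit of b in v if it is set
        if v:
            basis.append(v)
    return len(basis)

def circuits(cols):
    """All circuits (minimal dependent column sets) as frozensets of indices."""
    n = len(cols)

    def rk(A):
        return rank_f2([cols[i] for i in A])

    return [frozenset(A)
            for s in range(1, n + 1) for A in combinations(range(n), s)
            if rk(A) < s and all(rk(B) == s - 1 for B in combinations(A, s - 1))]

def has_nontrivial_union(sets):
    """True if no set is contained in the union of the others."""
    return all(not c <= frozenset().union(*(d for j, d in enumerate(sets) if j != i))
               for i, c in enumerate(sets))

def lemma1_violations(cols):
    """Collections with a non trivial union that break rank(U) <= |U| - m."""
    cs = circuits(cols)
    bad = []
    # A non trivial union of m sets needs m distinct private elements, so m <= n.
    for m in range(1, min(len(cs), len(cols)) + 1):
        for group in combinations(cs, m):
            if has_nontrivial_union(group):
                U = frozenset().union(*group)
                if rank_f2([cols[i] for i in U]) > len(U) - m:
                    bad.append(group)
    return bad

# Hypothetical 3 x 6 binary matrix, one bitmask per column.
example = [0b001, 0b010, 0b100, 0b011, 0b110, 0b111]
print(lemma1_violations(example))
```

An empty list means that every collection of circuits with a non trivial union satisfies the inequality of the lemma for this matrix.
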
In what follows, we consider $\mathcal{M}$ to be the matroid that is
represented by a code generator matrix G of size k x n. We
will define a new parameter $\mu$ relevant to the matroid $\mathcal{M}$,
which will be used later in calculating the minimum distance
of the code generated by G. We note that $\mu$ can also
be defined for non-representable matroids. We
proceed with its definition and properties.

Definition 1 Denote by $\mu$ the minimum integer such that the
size of every non trivial union of $\mu$ circuits in $\mathcal{M}$ is at least
$k + \mu$.

Lemma 2 Let $\mu$ be defined as above, then

• $\mu$ is bounded between 1 and n + 1.

• There are $\mu$ circuits $c_1, \ldots, c_\mu$ in $\mathcal{M}$ whose union is
non trivial.

Proof: Since there is no non trivial union of n + 1 circuits,
the statement "any non trivial union of n + 1 circuits is of
size at least k + (n + 1)" is satisfied trivially, and therefore
$\mu \le n + 1$. On the other hand, since a union of only one
circuit is clearly a non trivial union, we conclude that $1 \le \mu$.
For the last part of the lemma, assume to the contrary that
there are no $\mu$ circuits whose union is non trivial. Let $\mu'$ be
the maximal integer such that there are $\mu'$ circuits $c_1, \ldots, c_{\mu'}$
whose union is non trivial. By the assumption $\mu' < \mu$, there
exist $\mu'$ circuits $c_1, \ldots, c_{\mu'}$ whose union is non trivial, and the
size of the union is at most $k - 1 + \mu'$, namely

    $\left|\bigcup_{i=1}^{\mu'} c_i\right| \le k - 1 + \mu'.$    (4)

By the maximality of $\mu'$ we conclude that $\bigcup_{i=1}^{\mu'} c_i = [n]$,
otherwise there would be a non trivial union of $\mu' + 1$ circuits.
Hence,

    $k = \mathrm{rank}([n]) = \mathrm{rank}\left(\bigcup_{i=1}^{\mu'} c_i\right) \le \left|\bigcup_{i=1}^{\mu'} c_i\right| - \mu' \le k - 1 + \mu' - \mu' = k - 1,$

where the first and the second
inequalities follow from Lemma 1 and (4), respectively. This
leads us to a contradiction, hence there are $\mu$ circuits whose
union is non trivial. ■
The next theorem is the main result of this section. It 
characterizes the properties of the locality and minimum 
distance of a code in terms of the circuits in its matroid. 

Theorem 2 Let G, $\mathcal{M}$ and $\mu$ be defined as above. Then,

1) The code has locality r iff each i = 1,...,n is contained
in a circuit of size at most r + 1.

2) The distance of the code is equal to $d = n - k - \mu + 2$.

Proof:

1) This follows trivially from the definition of a circuit.

2) If $\mu = 1$, then by definition, the size of any circuit is
at least k + 1. Hence any k columns of G are
linearly independent and G is a generator matrix of an
MDS code, namely d = n - k + 1. If $\mu \ge 2$, then
by the minimality of $\mu$ there exist $\mu - 1 \ge 1$ circuits
$c_1, \ldots, c_{\mu-1}$ whose union is non trivial and is of size at
most $k - 1 + \mu - 1 = k + \mu - 2$. Hence by Lemma 1
$\mathrm{rank}(\bigcup_{i=1}^{\mu-1} c_i) \le |\bigcup_{i=1}^{\mu-1} c_i| - (\mu - 1) \le k - 1$.
Adding columns to this union one at a time raises its rank by
at most one per column, so the union can be enlarged to a set U
with $\mathrm{rank}(U) = k - 1$ and $|U| - \mathrm{rank}(U) \ge \mu - 1$, that is,
$|U| \ge k + \mu - 2$. Let x
be a nonzero vector of length k which is orthogonal to
the columns of G with indices in U. Clearly such a
vector x exists since the rank of U is
k - 1. Then by the choice of x we get that $x \cdot G$ is
a nonzero codeword of weight at most $n - (k + \mu - 2)$,
and therefore the minimum distance satisfies
$d \le n - k - \mu + 2$. On the other hand, let T be the
set of zero coordinates of some nonzero codeword from
the code generated by G. Let $S \subseteq T$ be a maximal
independent set in T. Clearly the size of S is at most
k - 1. Let $T \setminus S = \{t_1, \ldots, t_l\}$. We claim that $l \le \mu - 1$.
Assume the opposite, then for each i = 1,...,l the set
$\{t_i\} \cup S$ contains a circuit that contains $t_i$, hence T contains
at least l distinct circuits whose union is non trivial.
From the definition of $\mu$ we conclude that
$k - 1 \ge |S| = |S \cup t_1 \cup \ldots \cup t_\mu| - \mu \ge k + \mu - \mu = k$, and
we get a contradiction. The last inequality follows since
$S \cup t_1 \cup \ldots \cup t_\mu$ contains $\mu$ circuits whose union is non
trivial. Therefore the weight of the codeword is
$n - |T| = n - (|S| + |T \setminus S|) \ge n - (k - 1 + \mu - 1)$. Hence
the minimum distance also satisfies $d \ge n - k - \mu + 2$,
and the result follows.

■
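
The relation $d = n - k - \mu + 2$ can be checked by brute force on small binary codes. The sketch below is illustrative only: it assumes codes over $\mathbb{F}_2$ with generator-matrix columns encoded as k-bit integer bitmasks, and the function names and example matrices are hypothetical.

```python
from itertools import combinations

def rank_f2(vectors):
    """Rank over F_2 of vectors given as integer bitmasks (XOR basis)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)

def circuits(cols):
    """All circuits (minimal dependent column sets) as frozensets of indices."""
    n = len(cols)

    def rk(A):
        return rank_f2([cols[i] for i in A])

    return [frozenset(A)
            for s in range(1, n + 1) for A in combinations(range(n), s)
            if rk(A) < s and all(rk(B) == s - 1 for B in combinations(A, s - 1))]

def has_nontrivial_union(sets):
    """True if no set is contained in the union of the others."""
    return all(not c <= frozenset().union(*(d for j, d in enumerate(sets) if j != i))
               for i, c in enumerate(sets))

def mu(cols, k):
    """Smallest y such that every non trivial union of y circuits has size >= k + y."""
    cs = circuits(cols)
    for y in range(1, len(cols) + 2):
        unions = (frozenset().union(*g) for g in combinations(cs, y)
                  if has_nontrivial_union(g))
        if all(len(U) >= k + y for U in unions):
            return y

def min_distance(cols, k):
    """Brute-force minimum weight of a nonzero codeword x * G over F_2."""
    # Coordinate j of x * G is the parity of the bits shared by x and column j.
    return min(sum(bin(x & c).count("1") % 2 for c in cols)
               for x in range(1, 2 ** k))

for k, cols in [(2, [0b01, 0b10, 0b11]),
                (2, [0b01, 0b10, 0b11, 0b11]),
                (3, [0b001, 0b010, 0b100, 0b011, 0b110, 0b111])]:
    n = len(cols)
    print(n, k, mu(cols, k), min_distance(cols, k), n - k - mu(cols, k) + 2)
```

For each example the last two printed numbers, the brute-force distance and $n - k - \mu + 2$, can be compared directly. The enumeration of circuit collections grows quickly, so this check is practical only for very short codes.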

From Theorem 2 we get the following corollary, which
characterizes all optimal linear LRCs.

Corollary 1 The code generated by G has locality r, and
optimal minimum distance $d = n - (k + \lceil k/r \rceil) + 2$ iff

1) Any i = 1,...,n is contained in a circuit of size at most
r + 1.

2) The size of any non trivial union of $\lceil k/r \rceil$ circuits in $\mathcal{M}$
is at least $k + \lceil k/r \rceil$.

The previous corollary provides necessary and sufficient
conditions for an optimal linear LRC. We derive from it
another corollary, which gives simple sufficient conditions
for an optimal linear LRC. In what follows, we call a circuit
nontrivial if its size is at most k.

Corollary 2 Let G and $\mathcal{M}$ be as before, and let

1) all nontrivial circuits be of size at most r + 1 and let
them form a partition of [1,n].

2) For any collection of nontrivial circuits c,- 

Then the code has locality r, and optimal minimum distance
$d = n - k - \lceil k/r \rceil + 2$.

IV. Optimality of Code Construction 

In this section we will prove Theorem 1. For i = 1,...,m/r,
let $V_i$ be the Vandermonde matrix of size k x r defined
by the elements $\alpha_{r(i-1)+1}, \ldots, \alpha_{ri} \in \mathbb{F}_p$, namely

    $V_i = (\boldsymbol{\alpha}_{r(i-1)+1}, \boldsymbol{\alpha}_{r(i-1)+2}, \ldots, \boldsymbol{\alpha}_{ri}),$

where $\boldsymbol{\alpha}_j$ denotes the column $(1, \alpha_j, \ldots, \alpha_j^{k-1})^t$.
Then, we can rewrite the generator matrix G as

    $G = (V_1 \cdot A, V_2 \cdot A, \ldots, V_{m/r} \cdot A).$

It is easy to check that any proper subset of the columns of
A is linearly independent and therefore A generates an
(r + 1, r) MDS code. This means that our code has locality r:
any lost element can be repaired by accessing the r remaining
elements that come from the same local code.
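
The local repair step can be sketched in a few lines. The example below is a simplification under stated assumptions: it works in a small prime field ($\mathbb{F}_7$) with an arbitrary nonzero stand-in for the superdiagonal entry of A, since the repair relation depends only on that entry being nonzero, whereas the construction itself takes it in the extension field. All names are hypothetical. The repair uses the vector h with $A \cdot h = 0$, given by $h_j = (-\omega)^{r+1-j}$ for j = 1,...,r + 1, so every local codeword z satisfies $z \cdot h = 0$.

```python
P = 7        # assumed small prime field, for illustration only
OMEGA = 3    # stand-in for the superdiagonal entry of A; any nonzero value works here

def local_encode(u):
    """Multiply the length-r row vector u by the r x (r + 1) matrix A."""
    r = len(u)
    z = []
    for j in range(r + 1):
        val = 0
        if j < r:            # the 1 on the main diagonal
            val += u[j]
        if j > 0:            # the entry on the superdiagonal
            val += OMEGA * u[j - 1]
        z.append(val % P)
    return z

def parity_vector(r):
    """Vector h with A h = 0; 0-based entry j equals (-OMEGA)^(r - j)."""
    return [pow(-OMEGA, r - j, P) for j in range(r + 1)]

def repair(z, lost):
    """Recover z[lost] from the other r symbols of the group using z . h = 0."""
    h = parity_vector(len(z) - 1)
    s = sum(z[l] * h[l] for l in range(len(z)) if l != lost) % P
    return (-s * pow(h[lost], -1, P)) % P

z = local_encode([2, 5, 6])
print(z, [repair(z, j) for j in range(len(z))])
```

Every entry of h is nonzero, which is why any single symbol of a group can be expressed through the other r. The modular inverse `pow(x, -1, P)` requires Python 3.8 or later.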

We continue by showing the optimality of its distance. By
the construction, for any i = 1,...,n/(r + 1), the set of
integers $c_i = [1 + (i - 1)(r + 1), i(r + 1)]$ forms a circuit
of size r + 1 in the matroid represented by G. We will show
that these circuits are the only circuits of size at most k,
and by Corollary 2 this will imply the optimality of the
minimum distance. Let S be the collection of all k-subsets of [n]
that do not contain any circuit $c_i$, namely
$S = \{s \subseteq [n] : |s| = k \text{ and } c_i \not\subseteq s \text{ for any } i = 1, \ldots, \frac{n}{r+1}\}$.
For $s \in S$, denote by
$G_s$ the square sub-matrix of G restricted to columns with
indices in s. Showing that $c_1, \ldots, c_{n/(r+1)}$ are the only circuits
of size at most k is equivalent to showing that the matrix $G_s$
is invertible for any $s \in S$. This will be done by showing
that the determinant of $G_s$ is a nonzero polynomial in $\omega$, the
primitive element of Construction 1, with
coefficients in $\mathbb{F}_p$ and degree at most k. However, since $\omega$
is a primitive element of the field $\mathbb{F}_{p^{k+1}}$, the degree of its
minimal polynomial in $\mathbb{F}_p[x]$ is exactly k + 1. Therefore the
determinant does not evaluate to zero in $\mathbb{F}_{p^{k+1}}$, i.e., $G_s$ is
invertible for any $s \in S$, and the result will follow. To prove
that, we first show a key property about the permanent of A.
First, we extend the definition of a permanent of a square
matrix to a non square matrix as follows. Let $B = (b_{i,j})$ be
an r x t matrix with $t \le r$, then

    $\mathrm{perm}(B) = \sum_{(v_1, \ldots, v_t),\ v_i \ne v_j} \ \prod_{i=1}^{t} b_{v_i, i}.$    (5)

Intuitively, the permanent is the sum of all products of
elements in B, such that exactly one entry is picked from each
column, and no two elements are picked from the same row.
Note that throughout, the summation in (5) is done over $\mathbb{Z}$.

Lemma 3 Let B be an r x t sub-matrix of A for $r \ge t$, then
the permanent of B is a monic polynomial in $\omega$ of degree at
most t.

Proof: Each of the products in (5) is a product of exactly
t elements of B. In addition, each element equals $\omega$, 1 or
0, hence the degree of each term is at most t. For the second
part, note that B can be written as a block diagonal matrix
with blocks $B_1, \ldots, B_m$, for some m. Hence the permanent of
B is the product of the permanents of its blocks. Therefore the
permanent of B is a monic polynomial if the permanent of
each block is a monic polynomial. This fact can be
easily verified, and the result follows. ■
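For example, let B be composed of the columns 1, 2, 4 and 5
of A of size 4 x 5, then

    $B = \begin{pmatrix} 1 & \omega & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \omega & 0 \\ 0 & 0 & 1 & \omega \end{pmatrix}.$

Hence $B_1 = \begin{pmatrix} 1 & \omega \\ 0 & 1 \end{pmatrix}$ and
$B_2 = \begin{pmatrix} \omega & 0 \\ 1 & \omega \end{pmatrix}$. Moreover,
$\mathrm{perm}(B) = \mathrm{perm}(B_1) \cdot \mathrm{perm}(B_2) = 1 \cdot \omega^2 = \omega^2$.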
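
The extended permanent and the example above can be reproduced with a short script. In this sketch, which uses hypothetical names and an encoding of our own choosing, each entry of A is stored by its exponent (`None` for 0, `0` for 1, `1` for the superdiagonal entry), and the permanent is returned as a list of integer coefficients, lowest degree first.

```python
from itertools import combinations, permutations

def build_A(r):
    """r x (r + 1) matrix A stored by exponents: None = 0, 0 = 1, 1 = omega."""
    A = [[None] * (r + 1) for _ in range(r)]
    for i in range(r):
        A[i][i] = 0          # ones on the main diagonal
        A[i][i + 1] = 1      # omega on the superdiagonal
    return A

def submatrix(A, cols):
    """Keep only the listed (0-based) columns of A."""
    return [[row[c] for c in cols] for row in A]

def permanent(B):
    """Coefficients of perm(B) as a polynomial in omega over the integers."""
    rows, t = len(B), len(B[0])
    coeffs = [0] * (t + 1)
    # One entry per column, taken from pairwise distinct rows, as in (5).
    for picked in permutations(range(rows), t):
        exps = [B[v][col] for col, v in enumerate(picked)]
        if None not in exps:                 # skip products containing a 0
            coeffs[sum(exps)] += 1
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()                         # drop trailing zero coefficients
    return coeffs

A = build_A(4)
B = submatrix(A, [0, 1, 3, 4])               # columns 1, 2, 4 and 5 of A
print(permanent(B))

# Leading coefficients for every choice of at most r columns of A.
leading = {permanent(submatrix(A, cols))[-1]
           for t in range(1, 5) for cols in combinations(range(5), t)}
print(leading)
```

The second printout collects the leading coefficients over all column choices for r = 4, which gives a direct check of the monic property stated in Lemma 3 for that case.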

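Let $G_s$ be a sub-matrix of G for some $s \in S$, and note
that due to the structure of A each column in $G_s$ is a linear
combination of at most two columns of the form $\boldsymbol{\alpha}_i$. For
example, let r = 3 and k = 6, and let $G_s$ be the 6 x 6 matrix
composed of the first three columns of $V_1 \cdot A$, the first and third
columns of $V_2 \cdot A$, and the second column of $V_3 \cdot A$. Then
$G_s = (V_1 A_1, V_2 A_2, V_3 A_3)$, where for each i = 1, 2, 3 the
matrix $A_i$ is a sub-matrix of A. More precisely, $G_s$ can be
written as

    $G_s = (\boldsymbol{\alpha}_1,\ \omega\boldsymbol{\alpha}_1 + \boldsymbol{\alpha}_2,\ \omega\boldsymbol{\alpha}_2 + \boldsymbol{\alpha}_3,\ \boldsymbol{\alpha}_4,\ \omega\boldsymbol{\alpha}_5 + \boldsymbol{\alpha}_6,\ \omega\boldsymbol{\alpha}_7 + \boldsymbol{\alpha}_8).$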

Recall that the determinant function is linear in its columns,
namely if u and v are column vectors, B is some matrix, and
a, b are scalars, then
$\det(a \cdot v + b \cdot u, B) = a \cdot \det(v, B) + b \cdot \det(u, B)$.
Therefore the determinant of $G_s$ can be expanded
into a linear combination of powers of $\omega$ multiplied by
determinants of Vandermonde matrices, i.e., it is a polynomial in $\omega$
over $\mathbb{F}_p$. The degree of the polynomial $\det(G_s)$ is two, and its
leading coefficient equals
$\det(\boldsymbol{\alpha}_1, \boldsymbol{\alpha}_2, \boldsymbol{\alpha}_3, \boldsymbol{\alpha}_4, \boldsymbol{\alpha}_5, \boldsymbol{\alpha}_7) \in \mathbb{F}_p$.
This coefficient is non zero because all the $\alpha_i$'s are distinct.
Note that there are other terms in the expansion that could
potentially contribute a greater degree of $\omega$, such as

    $\omega^4 \cdot \det(\boldsymbol{\alpha}_1, \boldsymbol{\alpha}_1, \boldsymbol{\alpha}_2, \boldsymbol{\alpha}_4, \boldsymbol{\alpha}_5, \boldsymbol{\alpha}_7).$    (6)

However, this term equals zero since $\boldsymbol{\alpha}_1$ appears twice in
(6). It is easy to see that in the expansion of the polynomial
$\det(G_s)$ there is exactly one non zero term of the form $c \cdot \omega^2$
for some $c \in \mathbb{F}_p$, hence $\det(G_s)$ is a non zero polynomial in
$\omega$ of degree 2.
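
The expansion in this example can be written out term by term. The following is a sketch under the assumption, read off from the chosen columns, that $G_s = (\boldsymbol{\alpha}_1, \omega\boldsymbol{\alpha}_1 + \boldsymbol{\alpha}_2, \omega\boldsymbol{\alpha}_2 + \boldsymbol{\alpha}_3, \boldsymbol{\alpha}_4, \omega\boldsymbol{\alpha}_5 + \boldsymbol{\alpha}_6, \omega\boldsymbol{\alpha}_7 + \boldsymbol{\alpha}_8)$, where $\omega$ denotes the primitive element and $\boldsymbol{\alpha}_i$ the Vandermonde column generated by $\alpha_i$. The shorthand $\Delta(a, b)$ is introduced here only for brevity and is not notation from the paper.

```latex
% Shorthand (ours): \Delta(a,b) = \det(\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_a, \alpha_b)
\begin{align*}
\det(G_s)
  &= \det(\boldsymbol{\alpha}_1,\ \omega\boldsymbol{\alpha}_1 + \boldsymbol{\alpha}_2,\
          \omega\boldsymbol{\alpha}_2 + \boldsymbol{\alpha}_3,\ \boldsymbol{\alpha}_4,\
          \omega\boldsymbol{\alpha}_5 + \boldsymbol{\alpha}_6,\
          \omega\boldsymbol{\alpha}_7 + \boldsymbol{\alpha}_8) \\
  % terms with \omega\alpha_1 in column 2 or \omega\alpha_2 in column 3 repeat a column
  &= \det(\boldsymbol{\alpha}_1,\ \boldsymbol{\alpha}_2,\ \boldsymbol{\alpha}_3,\
          \boldsymbol{\alpha}_4,\ \omega\boldsymbol{\alpha}_5 + \boldsymbol{\alpha}_6,\
          \omega\boldsymbol{\alpha}_7 + \boldsymbol{\alpha}_8) \\
  &= \omega^2\,\Delta(5,7) + \omega\,\bigl[\Delta(5,8) + \Delta(6,7)\bigr] + \Delta(6,8).
\end{align*}
```

Each $\Delta(a, b)$ is the determinant of a 6 x 6 Vandermonde matrix in distinct elements of $\mathbb{F}_p$ and is therefore nonzero, so the coefficient of $\omega^2$ is nonzero and the polynomial has degree exactly 2.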

In the general case, $G_s$ can be written as

    $G_s = (V_{i_1} A_1, \ldots, V_{i_t} A_t) = (V_{i_1}, \ldots, V_{i_t}) \cdot D(A_1, \ldots, A_t)$

for some $1 \le t \le m/r$, where $D = D(A_1, \ldots, A_t)$ is a block diagonal
matrix with the matrices $A_1, \ldots, A_t$ along its diagonal. Furthermore,
each $A_j$ is a sub-matrix of A. Clearly, perm(D) equals the
product of the permanents of its blocks and its degree equals
the degree of $\det(G_s)$. Moreover, the coefficient of $\omega^i$ in
perm(D) indicates the number of nonzero terms of degree
i in the expansion of $\det(G_s)$. By Lemma 3 each of the
polynomials $\mathrm{perm}(A_j)$ is monic, hence so is perm(D): there
is only one non zero term in the expansion of $\det(G_s)$ with the
largest degree of $\omega$. Therefore $\det(G_s)$ is a nonzero polynomial
in $\omega$ over $\mathbb{F}_p$. Moreover, D has k columns, therefore the
determinant is a non zero polynomial of degree at most k.
For the final step, since the minimal degree of a non zero
polynomial over $\mathbb{F}_p$ that annihilates $\omega$ is k + 1, we conclude
that $\det(G_s)$ is a non zero element of $\mathbb{F}_{p^{k+1}}$, and therefore $G_s$
is invertible.

V. Conclusions 

In this work we introduced a new family of optimal (n,k,r)-LRCs
that are simple to implement. The codes are based
on re-encoding Reed-Solomon encoded blocks for the added
property of locality. To prove the optimality of our code,
we first established a connection between the minimum code
distance and properties of the matroid represented by the
generator matrix of a code. We then showed that some
subsets of the columns in the code generator matrix are full-rank.
Our code construction is simple but requires a large
finite field. This, however, does not seem to be a significant
practical problem, since each field element requires only O(k log n)
bits to be represented. Explicit constructions of optimal LRCs
for the case when r + 1 does not divide n and for small finite
fields remain open problems.